Can you predict the likelihood that a car will fully stop at a stop sign based on its type and color?

Why this question?

I chose to investigate this question because I’ve always been intrigued by traffic patterns, specifically the inefficiencies of modern driving. Although seemingly basic, this study was aimed at illustrating the large number of vehicles that speed through the Eagle Row stop signs, a severe safety hazard for students and faculty alike. Eagle Row, with a speed limit of just 25 miles per hour, consistently sees vehicles racing at speeds of up to 50 miles per hour, creating an unsafe environment for the Emory community.

I recorded basic information about each vehicle, namely its type (SUV or Sedan), its color (Light or Dark), and whether it fully stopped at the stop sign, to give students better information about whether crossing the road is safe based on a few simple parameters. Essentially, the study is designed to inform students about how likely an approaching vehicle is to roll through the stop sign rather than fully stop.

Data Collection

To collect data, I positioned myself at the corner of Eagle Row and Means Drive. This intersection was chosen because of its high volume of traffic. I was also able to stay warm inside Kaldi’s Coffee while conducting my experiment.

For each vehicle, I recorded the type (SUV or Sedan), the color (Light or Dark), and whether the vehicle fully stopped at the stop sign in an Excel document. I also coded these variables as 0s and 1s for the regression analysis described later. Light-colored vehicles were coded 1 and dark-colored vehicles 0. Sedans were coded 0 and SUVs 1. Finally, vehicles that fully stopped were coded 1, while vehicles that didn’t come to a full stop were coded 0. I then converted this document into a .csv file for analysis in R.
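The 0/1 coding described above could also be done directly in R rather than in the spreadsheet. The sketch below uses a made-up three-row data frame (`obs` and its values are hypothetical, not rows from the actual dataset) and assumes the same column names as the .csv file.

```r
# Hypothetical sample rows, for illustration only (not taken from Stop_data.csv)
obs <- data.frame(
  Color = c("Light", "Dark", "Dark"),
  Type  = c("SUV", "Sedan", "SUV"),
  Stop  = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# A logical comparison converted to integer gives the 0/1 coding:
obs$Color_Coded <- as.integer(obs$Color == "Light")  # Light = 1, Dark = 0
obs$Type_Coded  <- as.integer(obs$Type == "SUV")     # SUV = 1, Sedan = 0
obs$Stop_Coded  <- as.integer(obs$Stop == "Yes")     # full stop = 1, otherwise 0

print(obs)
```

Coding in R keeps the raw labels and the numeric codes in one place, which makes a mismatch between the two easier to spot.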

**Figure 1.** Intersection where data was collected [*(Source: Google StreetView)*](https://raw.githubusercontent.com/Hueyoneil/qtm150_final/master/Eagle_Row.png)


Overview of Dataset

The dataset consists of 84 observations collected over approximately 30 minutes of observing vehicles at 1 PM on a Sunday.

knitr::opts_chunk$set(echo = TRUE)
Data <- read.csv(file="https://raw.githubusercontent.com/Hueyoneil/qtm150_final/master/Stop_data.csv", header=TRUE, sep=",")
summary(Data)
##    Color     Color_Coded       Type      Type_Coded      Stop   
##  Dark :52   Min.   :0.000   Sedan:56   Min.   :0.0000   No :60  
##  Light:32   1st Qu.:0.000   SUV  :28   1st Qu.:0.0000   Yes:24  
##             Median :0.000              Median :0.0000           
##             Mean   :0.381              Mean   :0.3333           
##             3rd Qu.:1.000              3rd Qu.:1.0000           
##             Max.   :1.000              Max.   :1.0000           
##    Stop_Coded        X             X.1            X.2         
##  Min.   :0.0000   Mode:logical   Mode:logical   Mode:logical  
##  1st Qu.:0.0000   NA's:84        NA's:84        NA's:84       
##  Median :0.0000                                               
##  Mean   :0.2857                                               
##  3rd Qu.:1.0000                                               
##  Max.   :1.0000                                               
##    X.3            X.4         
##  Mode:logical   Mode:logical  
##  NA's:84        NA's:84       
##                               
##                               
##                               
## 
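The summary above lists five columns (`X` through `X.4`) that contain only missing values, most likely empty trailing columns carried over from the spreadsheet export. A minimal sketch of dropping such columns is shown here on a small stand-in data frame (`demo` is hypothetical). The same indexing could be applied to `Data`.

```r
# Stand-in data frame with two all-NA columns (hypothetical, for illustration)
demo <- data.frame(Stop_Coded = c(1, 0, 0), X = NA, X.1 = NA)

# Flag columns in which every value is missing
all_missing <- colSums(!is.na(demo)) == 0

# Keep only the columns that hold at least one observed value
cleaned <- demo[, !all_missing, drop = FALSE]

names(cleaned)
```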

Visualizing the Dataset

Pie Charts

# Draw the three pie charts side by side
par(mfrow=c(1,3)) 

# Share of light vs. dark vehicles, labelled with rounded percentages
pie1 <- c(sum(Data$Color == "Light")/nrow(Data),sum(Data$Color == "Dark")/nrow(Data))
label1 <- c("Light", "Dark")
pct <- round(pie1/sum(pie1)*100)
label1 <- paste(label1, pct)
label1 <- paste(label1, "%", sep="")
pie(pie1, labels = label1, main="Color of Vehicle")

# Share of sedans vs. SUVs
pie2 <- c(sum(Data$Type == "Sedan")/nrow(Data),sum(Data$Type == "SUV")/nrow(Data))
label2 <- c("Sedan", "SUV")
pct <- round(pie2/sum(pie2)*100)
label2 <- paste(label2, pct)
label2 <- paste(label2, "%", sep="")
pie(pie2, labels = label2, main="Type of Vehicle")

# Share of vehicles that did vs. did not come to a full stop
pie3 <- c(sum(Data$Stop == "Yes")/nrow(Data),sum(Data$Stop == "No")/nrow(Data))
label3 <- c("Yes", "No")
pct <- round(pie3/sum(pie3)*100)
label3 <- paste(label3, pct)
label3 <- paste(label3, "%", sep="")
pie(pie3, labels = label3, main="Fully Stopped at Stop Sign")

Bar Graphs

par(mfrow=c(1,3)) 

barplot(table(Data$Color), main="Vehicle Color", ylim=c(0,60))
barplot(table(Data$Type),  main="Vehicle Type", ylim=c(0,60))
barplot(table(Data$Stop),  main="Full-Stop", ylim=c(0,60))

Predicting the Outcome

To predict whether a car will stop at the stop sign, I regressed the coded outcome (whether the vehicle fully stopped) on my two independent variables, type of vehicle and color of vehicle. Because the outcome is coded 0 or 1, each coefficient can be read as a change in the estimated probability of fully stopping. Although the sample is relatively small and the relationship is presumably non-causal, I still found a correlation between vehicle color and whether the vehicle fully stopped.

For my first regression, I included both variables to predict the outcome. After analyzing the results, as seen below, I determined that the type of vehicle had nearly no effect on the outcome: its estimated coefficient is -0.025, and its 95% confidence interval of (-0.219, 0.169) includes zero.

library(stargazer)  # needed for the regression tables

# Model 1: Stop_Coded is 1 for a full stop, so the outcome is the likelihood of fully stopping
fit <- lm(Data$Stop_Coded ~ Data$Type_Coded + Data$Color_Coded, data=Data)
stargazer(fit, type="html", dep.var.labels=c("Likelihood of Fully Stopping"), covariate.labels=c("Type of Vehicle","Color of Vehicle"), ci = TRUE)

| | Dependent variable: Likelihood of Fully Stopping |
|---|---|
| Type of Vehicle | -0.025 (-0.219, 0.169) |
| Color of Vehicle | 0.348*** (0.159, 0.536) |
| Constant | 0.161** (0.031, 0.292) |
| Observations | 84 |
| R2 | 0.139 |
| Adjusted R2 | 0.118 |
| Residual Std. Error | 0.427 (df = 81) |
| F Statistic | 6.545*** (df = 2; 81) |

Note: * p<0.1; ** p<0.05; *** p<0.01
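To make the first model's coefficients concrete, the sketch below plugs the rounded estimates reported above (constant 0.161, type -0.025, color 0.348) into the regression equation for each of the four vehicle categories. The object names are illustrative, and because the inputs are rounded, the results are approximate.

```r
# Rounded coefficients copied from the first regression table
b_const <- 0.161
b_type  <- -0.025   # SUV = 1, Sedan = 0
b_color <- 0.348    # Light = 1, Dark = 0

# All four combinations of the two 0/1 predictors
grid <- expand.grid(Type_Coded = c(0, 1), Color_Coded = c(0, 1))

# Estimated probability of a full stop for each combination
grid$p_stop <- b_const + b_type * grid$Type_Coded + b_color * grid$Color_Coded

print(grid)
```

Under these estimates a dark sedan has a predicted full-stop probability of about 0.16 and a light sedan about 0.51, while switching from sedan to SUV shifts either figure by only 0.025.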

Best Predictor

I used model 2 to predict the outcome based solely on the color of the vehicle. I found that light-colored cars were an estimated 0.346 (about 35 percentage points) more likely to fully stop at the stop sign than dark-colored cars, with a 95% confidence interval of (0.159, 0.533).

# Model 2: color of vehicle as the only predictor
fit1 <- lm(Data$Stop_Coded ~ Data$Color_Coded, data=Data)

# Stop_Coded is 1 for a full stop, so the outcome is the likelihood of fully stopping
stargazer(fit, fit1, type="html", dep.var.labels=c("Likelihood of Fully Stopping"), covariate.labels=c("Type of Vehicle","Color of Vehicle"), ci = TRUE)

Dependent variable: Likelihood of Fully Stopping

| | (1) | (2) |
|---|---|---|
| Type of Vehicle | -0.025 (-0.219, 0.169) | |
| Color of Vehicle | 0.348*** (0.159, 0.536) | 0.346*** (0.159, 0.533) |
| Constant | 0.161** (0.031, 0.292) | 0.154** (0.038, 0.269) |
| Observations | 84 | 84 |
| R2 | 0.139 | 0.138 |
| Adjusted R2 | 0.118 | 0.128 |
| Residual Std. Error | 0.427 (df = 81) | 0.424 (df = 82) |
| F Statistic | 6.545*** (df = 2; 81) | 13.179*** (df = 1; 82) |

Note: * p<0.1; ** p<0.05; *** p<0.01
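With a single 0/1 predictor, the intercept of a linear regression is the share of the "0" group that stopped, and the slope is the difference between the two groups' shares. The sketch below illustrates this for model 2. The per-group counts (8 of 52 dark vehicles and 16 of 32 light vehicles stopping) are an assumption inferred from the reported totals and coefficients, not read from the raw file, and the interval uses a normal approximation.

```r
# Counts below are INFERRED from the reported totals and model 2 coefficients
# (an assumption; they were not read from Stop_data.csv).
check <- data.frame(
  Color_Coded = rep(c(0, 1), times = c(52, 32)),    # 52 dark, 32 light
  Stop_Coded  = c(rep(c(1, 0), times = c(8, 44)),   # dark: 8 stopped, 44 did not
                  rep(c(1, 0), times = c(16, 16)))  # light: 16 stopped, 16 did not
)

fit_check <- lm(Stop_Coded ~ Color_Coded, data = check)

# Share of each color group that fully stopped
p_dark  <- mean(check$Stop_Coded[check$Color_Coded == 0])
p_light <- mean(check$Stop_Coded[check$Color_Coded == 1])

# Normal-approximation 95% interval for the slope
est <- unname(coef(fit_check)["Color_Coded"])
se  <- summary(fit_check)$coefficients["Color_Coded", "Std. Error"]
ci  <- est + c(-1, 1) * qnorm(0.975) * se

round(c(intercept = p_dark, slope = p_light - p_dark), 3)
round(ci, 3)
```

Under these inferred counts the intercept (0.154), the slope (0.346), and the interval (0.159, 0.533) agree with the values reported for model 2, so the color coefficient can be read as the gap in full-stop rates between light and dark vehicles.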

Conclusion

After examining my data, I determined that the type of car, whether SUV or Sedan, has no statistically significant correlation with whether the car stops or keeps driving at a stop sign. However, I did find a correlation between the color of the vehicle and the likelihood that it fully stops: in this sample, light-colored vehicles were more likely to come to a full stop than dark-colored ones.